
    Learning a Complete Image Indexing Pipeline

    To work at scale, a complete image indexing system comprises two components: an inverted file index that restricts the actual search to a subset likely to contain most of the items relevant to the query, and an approximate distance computation mechanism to rapidly scan these lists. While supervised deep learning has recently enabled improvements to the latter, the former continues to be based on unsupervised clustering in the literature. In this work, we propose the first system that learns both components within a unifying neural framework of structured binary encoding.
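    The two-stage architecture the abstract describes can be sketched in a few lines. The toy example below is purely illustrative: random data and random centroids stand in for the learned coarse quantizer, and an exact in-list distance stands in for the approximate distance mechanism (a real system would use something like product quantization).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database and query (assumed setup, not the paper's data).
db = rng.standard_normal((1000, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)

# Stage 1: a coarse "inverted file" built from random centroids
# (the paper learns this stage; here it is fixed for illustration).
n_cells = 8
centroids = db[rng.choice(len(db), n_cells, replace=False)]
assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
inverted_lists = {c: np.where(assign == c)[0] for c in range(n_cells)}

# Stage 2: scan only the query's nearest cell, then rank candidates
# by exact distance (an approximate distance in a real system).
cell = int(np.argmin(((query - centroids) ** 2).sum(-1)))
candidates = inverted_lists[cell]
dists = ((db[candidates] - query) ** 2).sum(-1)
top = candidates[np.argsort(dists)[:5]]
```

    Only one inverted list is scanned, which is the source of the speedup: the cost of the final ranking scales with the list length rather than the database size.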

    Hybrid multi-layer Deep CNN/Aggregator feature for image classification

    Deep Convolutional Neural Networks (DCNN) have established a remarkable performance benchmark in the field of image classification, displacing classical approaches based on hand-tailored aggregations of local descriptors. Yet DCNNs impose high computational burdens both at training and at testing time, and training them requires collecting and annotating large amounts of training data. Supervised adaptation methods have been proposed in the literature that partially re-learn a transferred DCNN structure from a new target dataset. Yet these require expensive bounding-box annotations and are still computationally expensive to learn. In this paper, we address these shortcomings of DCNN adaptation schemes by proposing a hybrid approach that combines conventional, unsupervised aggregators such as Bag-of-Words (BoW) with the DCNN pipeline by treating the output of intermediate layers as densely extracted local descriptors. We test a variant of our approach that uses only intermediate DCNN layers on the standard PASCAL VOC 2007 dataset and show performance significantly higher than the standard BoW model and comparable to Fisher vector aggregation, but with a feature that is 150 times smaller. A second variant of our approach that includes the fully connected DCNN layers significantly outperforms Fisher vector schemes and performs comparably to DCNN approaches adapted to PASCAL VOC 2007, yet at only a small fraction of the training and testing cost. Comment: Accepted at ICASSP 2015; 5 pages including references, 4 figures and 2 tables.
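    The core idea of treating intermediate-layer outputs as densely extracted local descriptors can be sketched as follows. All shapes, the random activation map, and the random codebook are placeholders for illustration; a real pipeline would take the activations from a pretrained network and learn the codebook with k-means over many images.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for an intermediate DCNN activation map of shape (C, H, W).
C, H, W = 64, 7, 7
feature_map = rng.standard_normal((C, H, W)).astype(np.float32)

# Treat each spatial position as one densely extracted local descriptor.
descriptors = feature_map.reshape(C, H * W).T          # (H*W, C)

# Unsupervised BoW aggregation over a toy codebook of K visual words.
K = 16
codebook = rng.standard_normal((K, C)).astype(np.float32)
assignments = np.argmin(
    ((descriptors[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
bow = np.bincount(assignments, minlength=K).astype(np.float32)
bow /= bow.sum()                                       # L1-normalized histogram
```

    The resulting K-dimensional histogram replaces the much larger DCNN feature, which is how the approach obtains its compact representation.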

    Kernel Square-Loss Exemplar Machines for Image Retrieval

    Zepeda and Pérez have recently demonstrated the promise of the exemplar SVM (ESVM) as a feature encoder for image retrieval. This paper extends this approach in several directions: we first show that replacing the hinge loss by the square loss in the ESVM cost function significantly reduces encoding time with negligible effect on accuracy. We call this model the square-loss exemplar machine, or SLEM. We then introduce a kernelized SLEM, which can be implemented efficiently through low-rank matrix decomposition and displays improved performance. Both SLEM variants exploit the fact that the negative examples are fixed, so most of the SLEM computational complexity is relegated to an offline process independent of the positive examples. Our experiments establish the performance and computational advantages of our approach using a large array of base features and standard image retrieval datasets.
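    A minimal sketch of why the square loss helps: encoding one positive example reduces to solving a regularized linear system, and every term that depends only on the fixed negative pool can be cached offline. The dimensions, the regularization value, and the random features below are arbitrary placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_neg, lam = 32, 500, 1.0

# Fixed pool of negative features: quantities depending only on it
# are precomputed once, offline.
X_neg = rng.standard_normal((n_neg, d))
G_neg = X_neg.T @ X_neg                  # offline Gram of the negatives
s_neg = -X_neg.sum(axis=0)               # offline X_neg^T y with labels y = -1

def slem_encode(x_pos):
    """Square-loss exemplar encoding of one positive sample (illustrative)."""
    # Online cost: one rank-one update plus one d x d linear solve.
    A = G_neg + np.outer(x_pos, x_pos) + lam * np.eye(d)
    b = s_neg + x_pos                    # positive label +1
    return np.linalg.solve(A, b)

w = slem_encode(rng.standard_normal(d))
```

    With the hinge loss, each positive would instead require an iterative SVM solve; the closed form above is what makes square-loss encoding fast.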

    New methods for sparse representations; application to image compression and indexing

    A new dictionary structure called an Iteration-Tuned Dictionary (ITD) is introduced. ITDs are layered structures containing a set of candidate dictionaries in each layer. ITD-based iterative pursuit decompositions are carried out using, at each iteration i, one of the candidates from the i-th layer. A general ITD framework is proposed, as well as a tree-structured variant called the Tree-Structured Iteration-Tuned Dictionary (TSITD) and a constrained tree-structured variant called the Iteration-Tuned and Aligned Dictionary (ITAD). These structures are shown to outperform various state-of-the-art reference algorithms in their ability to sparsely approximate a dataset, as well as in image denoising and image compression applications. The ITAD scheme, in particular, is used to develop an image codec that outperforms JPEG2000. An approximate vector search method is also introduced that uses sparse representations to carry out low-complexity approximate nearest-neighbor image searches. The approach addresses the instability of the sparse support when the image patch is subject to weak affine transformations. In developing this new approach, a new data conditioning scheme is introduced that better distributes data on the unit sphere while preserving relative angles. This new approach is shown to improve the complexity/performance tradeoff of approximate searches based on sparse representations.
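    The layered-pursuit idea can be sketched as a matching-pursuit loop that swaps in a different dictionary at each iteration. The random unit-norm dictionaries below are placeholders: a real ITD learns each layer's atoms from the residuals of the previous layer, and may hold several candidate dictionaries per layer.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_atoms, n_layers = 16, 32, 4

# One placeholder dictionary per layer, with unit-norm atoms.
layers = [rng.standard_normal((d, n_atoms)) for _ in range(n_layers)]
layers = [D / np.linalg.norm(D, axis=0) for D in layers]

def itd_pursuit(signal):
    """Matching-pursuit-style decomposition using layer i's dictionary at step i."""
    residual = signal.copy()
    code = []
    for D in layers:
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))      # best atom in this layer
        code.append((k, corr[k]))
        residual -= corr[k] * D[:, k]          # subtract its contribution
    return code, residual

x = rng.standard_normal(d)
code, res = itd_pursuit(x)
```

    Because layer i only ever sees iteration-i residuals, its atoms can specialize to the statistics of that iteration, which is the intuition behind tuning dictionaries per iteration.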

    Tandem filterbank DFT code for bursty erasure correction

    Discrete Fourier Transform (DFT) encoding over the real (or complex) field has been proposed as a means to reconstruct samples lost in multimedia transmissions over packet-based networks. A collection of simple sample reconstruction (and error detection) algorithms makes DFT codes an interesting candidate. A common problem with DFT code sample reconstruction algorithms is that the quantization associated with practical implementations results in reconstruction errors that are particularly large when lost samples occur in bursts (bursty erasures). Following a survey of DFT decoding algorithms, we present herein the Tandem Filterbank/DFT Code (TFBD). The TFBD code consists of a tandem arrangement of a filterbank and a DFT encoder that effectively creates DFT codes along the rows (temporal codevectors) and columns (subband codevectors) of the frame under analysis. The tandem arrangement ensures that subband codevectors (the frame columns) are DFT codes, and we show how the temporal codevectors (frame rows) can also be interpreted as DFT codes. All the subband and temporal codevectors can be used to reconstruct samples entirely independently of one another. An erasure burst along a particular codevector can then be broken up by reconstructing some lost samples along the remaining orientation; these samples can then be used as received samples in reconstructing the original codevector, a technique that we refer to as pivoting. Expressions related to the performance of the TFBD code, including expressions for the temporal code reconstruction error and for temporal-to-subband pivoting operations, are derived and verified through simulations. These expressions also prove useful in selecting the many parameters that specify a TFBD encoder. The design process is illustrated for two sample TFBD codes, which are then compared to a benchmark DFT code at the same rate. The results show that the TFBD encoder achieves reconstruction errors more than four orders of magnitude smaller than those of the benchmark DFT code.
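    For context, the basic DFT-code erasure mechanism the paper builds on can be sketched as follows: a codeword is constrained to have zero spectrum in its parity bins, so each zero bin yields one linear equation in the erased samples. This sketch covers only the base mechanism, not the tandem filterbank arrangement or pivoting, and the code parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 16, 10                       # codeword length, message length

# Encode: force the top n-k DFT bins of the codeword to zero
# (a standard real/complex-field DFT-code construction).
msg = rng.standard_normal(k)
spectrum = np.zeros(n, dtype=complex)
spectrum[:k] = np.fft.fft(msg, k)   # message energy in the low bins
codeword = np.fft.ifft(spectrum)

# Erase a burst of samples.
erased = [3, 4, 5]
received = codeword.copy()
received[erased] = 0.0

# Reconstruct: each known zero bin gives one linear equation
# in the erased sample values.
F = np.fft.fft(np.eye(n))           # DFT matrix
parity_rows = F[k:]                 # rows that must annihilate the codeword
A = parity_rows[:, erased]
b = -parity_rows @ received
fixed = np.linalg.lstsq(A, b, rcond=None)[0]
restored = received.copy()
restored[erased] = fixed
```

    In exact arithmetic the system is consistent and the burst is recovered perfectly; the paper's concern is precisely that quantization perturbs this system, and badly so for bursts.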

    Approximate nearest neighbors using sparse representations

    A new method is introduced that makes use of sparse image representations to search for approximate nearest neighbors (ANN) under the normalized inner-product distance. The approach relies on the construction of a new sparse vector designed to approximate the normalized inner product between underlying signal vectors. The resulting ANN search algorithm shows significant improvement compared to querying with the original sparse vectors. The system makes use of a proposed transform that uniformly distributes the input dataset on the unit sphere while preserving relative angular distances.
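    The general idea of using a sparse code as a cheap proxy for the normalized inner product can be sketched as follows. The random dictionary and the crude thresholding coder below are illustrative stand-ins, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_atoms, s = 32, 64, 5

# Overcomplete dictionary with unit-norm atoms (illustrative, not learned).
D = rng.standard_normal((d, n_atoms))
D /= np.linalg.norm(D, axis=0)

def sparse_code(x, sparsity=s):
    """Keep the s largest-magnitude correlations as a crude sparse code."""
    corr = D.T @ (x / np.linalg.norm(x))
    code = np.zeros(n_atoms)
    idx = np.argsort(-np.abs(corr))[:sparsity]
    code[idx] = corr[idx]
    return code

# The sparse codes' inner product serves as a cheap proxy for the
# normalized inner product between the underlying signals.
a, b = rng.standard_normal(d), rng.standard_normal(d)
approx = sparse_code(a) @ sparse_code(b)
exact = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

    The payoff is that the proxy touches only the overlapping nonzero positions of the two codes, so database scans cost a fraction of a dense inner product.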

    Image compression using the Iteration-Tuned and Aligned Dictionary

    We present a new block-based image codec based on sparse representations over a learned, structured dictionary called the Iteration-Tuned and Aligned Dictionary (ITAD). The question of selecting the number of atoms used in the representation of each image block is addressed with a new, global (image-wide), rate-distortion-based sparsity selection criterion. We show experimentally that our codec outperforms JPEG2000 in both quantitative evaluations (by 0.9 dB to 4 dB) and qualitative evaluations.
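    One common way to realize a global (image-wide) rate-distortion sparsity criterion is greedy bit allocation: repeatedly give the next atom to whichever block buys the largest distortion drop per bit. The sketch below illustrates that generic mechanism with hypothetical per-block distortion gains and bit costs; it is not the paper's specific criterion.

```python
import numpy as np

rng = np.random.default_rng(6)
n_blocks, max_atoms = 8, 6

# Hypothetical distortion gains: gains[b, j] is the residual-energy drop
# from adding atom j+1 to block b (sorted so gains diminish, as in pursuit).
gains = np.sort(rng.uniform(0.1, 1.0, (n_blocks, max_atoms)), axis=1)[:, ::-1]
bits_per_atom = 8.0                 # assumed flat coding cost per atom

# Greedy global allocation under a total atom budget.
budget_atoms = 20
alloc = np.zeros(n_blocks, dtype=int)
for _ in range(budget_atoms):
    # Distortion drop per bit for each block's next atom (-1 if exhausted).
    nxt = [gains[b, alloc[b]] / bits_per_atom if alloc[b] < max_atoms else -1.0
           for b in range(n_blocks)]
    best = int(np.argmax(nxt))
    alloc[best] += 1
```

    Compared to a fixed per-block sparsity, this lets detailed blocks absorb more atoms while flat blocks stay cheap, which is the point of making the criterion image-wide.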

    Image Compression Using Sparse Representations and the Iteration-Tuned and Aligned Dictionary

    We introduce a new image coder that uses the Iteration-Tuned and Aligned Dictionary (ITAD) as a transform to code image blocks taken over a regular grid. We establish experimentally that the ITAD structure results in lower-complexity representations that enjoy greater sparsity when compared to other recent dictionary structures. We show that this superior sparsity can be exploited successfully for compressing images belonging to specific classes (e.g., facial images). We further propose a global rate-distortion criterion that distributes the code bits across the various image blocks. Our evaluation shows that the proposed ITAD codec can outperform JPEG2000 by more than 2 dB at 0.25 bpp and by 0.5 dB at 0.45 bpp, accordingly producing qualitatively better reconstructions.